TnT -- A Statistical Part-of-Speech Tagger

Author

  • Thorsten Brants
Abstract

Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT, the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
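The abstract mentions smoothing for a Markov-model tagger but gives no formulas. As a rough illustration only, the sketch below assumes a second-order (trigram) model over tags and smooths the tag transition probability by linearly interpolating unigram, bigram and trigram relative frequencies. The function names, the toy corpus and the fixed weights `lambdas` are assumptions made for this example; they are not taken from the TnT implementation, which would estimate its weights from data.

```python
from collections import Counter


def train_counts(tagged_sentences):
    """Collect unigram, bigram and trigram tag counts from (word, tag) sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for i, t in enumerate(tags):
            uni[t] += 1
            if i >= 1:
                bi[(tags[i - 1], t)] += 1
            if i >= 2:
                tri[(tags[i - 2], tags[i - 1], t)] += 1
    return uni, bi, tri


def interpolated_prob(t1, t2, t3, uni, bi, tri, lambdas=(0.2, 0.3, 0.5)):
    """Smoothed estimate of P(t3 | t1, t2) by linear interpolation.

    The weights in `lambdas` are hypothetical fixed values that sum to 1.
    """
    l1, l2, l3 = lambdas
    total = sum(uni.values())
    # Relative frequencies; an unseen context contributes 0 instead of dividing by zero.
    p_uni = uni[t3] / total if total else 0.0
    p_bi = bi[(t2, t3)] / uni[t2] if uni[t2] else 0.0
    p_tri = tri[(t1, t2, t3)] / bi[(t1, t2)] if bi[(t1, t2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Toy training data (made up for illustration).
toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
]
uni, bi, tri = train_counts(toy_corpus)
seen = interpolated_prob("DT", "NN", "VBZ", uni, bi, tri)
unseen = interpolated_prob("NN", "DT", "VBZ", uni, bi, tri)
```

In this toy setting the unseen tag sequence still receives a non-zero probability through the unigram term, which is the point of smoothing: sparse trigram counts alone would assign it zero.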


Similar resources


Unigram Backoff vs. TnT: Evaluating Part of Speech Taggers (Introduction to Computational Linguistics)

Automated statistical part-of-speech (POS) tagging has been a very active research area for many years and is the foundation of natural language processing systems. This paper introduces and analyzes the performance of two part-of-speech taggers, namely the NLTK unigram backoff tagger and the TnT tagger, a trigram tagger. Experimental results show that the TnT tagger outperforms the NLTK unigra...


The Open Source Tagger HunPoS for Swedish

HunPoS, a freely available open source part-of-speech tagger—a reimplementation of one of the best performing taggers, TnT—is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used effici...


TnT Tagger for Malayalam with Fuzzy Rule Based Learning

TnT is an efficient statistical part-of-speech (POS) tagger based on a Hidden Markov Model. TnT performs well on known word sequences, but its performance degrades as the number of unknown words increases. In this paper, we propose a method to overcome this performance degradation using fuzzy rules. A fuzzy rule based model is designed to provide TnT with sufficient information about the tag ...


Tagging the Dutch PAROLE Corpus

We discuss the annotation of the Dutch PAROLE Internet Corpus with part of speech and lemma. The PAROLE PoS tagger is a combination of statistical taggers. It includes the Markov tagger TnT and three taggers developed at the INL with the purpose of using other information besides the training data. The lemma is assigned by a deterministic procedure, based on an extensive lexicon. The output is in some ...



Publication date: 2000